An Adaptive LDA Optimal Topic Number Selection Method in News Topic Identification

Abstract

Nowadays, news text information is exploding, and people need more heterogeneous content. Topic identification is therefore needed to help viewers quickly and accurately screen and filter the news related to their interests, saving time and energy. The Latent Dirichlet Allocation (LDA) model is the most commonly used method for topic identification. In previous studies, the optimal number of topics must be specified in advance when using LDA to extract topics. However, selecting a number of topics that is too large or too small significantly impacts the final results of the models, which directly determines the quality of topic extraction. Moreover, news datasets from social media are very time-sensitive, and the combination of temporal and semantic modelling has not been considered in past studies. This paper proposes an adaptive method for determining the optimal number of LDA topics that fuses temporal and semantic information to address these problems. Semantic and temporal information are first extracted as two different views. Multi-view density peak clustering is then performed on the obtained feature vectors to determine the number of topics. To demonstrate the effectiveness of the proposed method, the paper compares four traditional methods for determining the optimal number of topics with the proposed method on public datasets. The results show that the proposed method, which considers temporal factors, is better than the other methods regarding F-value, PMI scores, and MI scores, and it performs well on the other indicators as well. The experimental results indicate that combining temporal and semantic data to determine the optimal number of topics in news text can improve the accuracy of selecting that number to some extent, which can help people understand and utilize massive news information. In addition, the method broadens the idea of identifying and mining unique topics from multiple perspectives.
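The density peak step mentioned in the abstract can be illustrated with a minimal single-view sketch. This is not the paper's multi-view method: it only shows the general idea of counting density peaks in a set of feature vectors and reading that count as a candidate number of topics. The function names, the Gaussian kernel, and the `cutoff` and `min_delta` parameters are assumptions made for illustration, and the synthetic 2-D points stand in for real document features.

```python
import math
import random


def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length vectors."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            dist[i][j] = dist[j][i] = d
    return dist


def count_density_peaks(points, cutoff, min_delta):
    """Estimate the number of clusters as the number of density peaks.

    cutoff    : kernel width for the local density (hypothetical parameter)
    min_delta : a point counts as a peak when no denser point lies
                within this distance (hypothetical parameter)
    """
    n = len(points)
    dist = pairwise_distances(points)

    # Local density rho_i: Gaussian-weighted count of the other points.
    rho = []
    for i in range(n):
        rho.append(sum(math.exp(-(dist[i][j] / cutoff) ** 2)
                       for j in range(n) if j != i))

    # Visit points from densest to sparsest. delta_i is the distance to the
    # nearest point visited earlier, i.e. the nearest denser point.
    order = sorted(range(n), key=lambda i: rho[i], reverse=True)
    delta = [0.0] * n
    delta[order[0]] = max(dist[order[0]])  # convention for the densest point
    for rank in range(1, n):
        i = order[rank]
        delta[i] = min(dist[i][j] for j in order[:rank])

    peaks = [i for i in range(n) if delta[i] > min_delta]
    return len(peaks), peaks


def make_blobs(centers, per_blob=30, spread=0.5, seed=0):
    """Synthetic 2-D feature vectors: one Gaussian blob per center."""
    rng = random.Random(seed)
    return [(cx + rng.gauss(0, spread), cy + rng.gauss(0, spread))
            for cx, cy in centers
            for _ in range(per_blob)]


points = make_blobs([(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)])
n_topics, peak_ids = count_density_peaks(points, cutoff=1.0, min_delta=5.0)
print(n_topics, peak_ids)
```

In this toy setting the three blobs are far apart relative to their spread, so only the densest point of each blob has no denser neighbour within `min_delta`. On real document features the thresholds would need tuning, and a multi-view variant would have to combine the semantic and temporal distances in some way that the abstract does not specify.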

Related articles

News Selection with Topic Modeling

Numerous news articles arrive at news aggregators, and important ones are selected to be presented on the front page. There are two types of news selection for the front page of news aggregators: personalized and public news recommendation (selection). This study examines public news recommendation, which aims to satisfy all users' interests on the front page. Public news recommendation i...

Dynamic Threshold Selection Method for Multi-label Newspaper Topic Identification

Nowadays, multi-label classification is increasingly required in modern categorization systems. It is especially essential in the task of newspaper article topic identification. This paper presents a method based on general topic model normalisation for finding a threshold defining the boundary between the “correct” and the “incorrect” topics of a newspaper article. The proposed method is ...

Topic detection in broadcast news

We propose a system for the Topic Detection and Tracking (TDT) detection task concerned with the unsupervised grouping of news stories according to topic. We use an incremental k-means algorithm for clustering stories. For comparing stories, we utilize a probabilistic document similarity metric and a traditional vector-space metric. We note that the clustering algorithm requires two differ...

Topic extraction with multiple topic-words in broadcast-news speech

This paper reports on topic extraction in Japanese broadcast-news speech. We studied, using continuous speech recognition, the extraction of several topic-words from broadcast news. A combination of multiple topic-words represents the content of the news. This is a more detailed and more flexible approach than using a single word or a single category. A topic-extraction model shows the degree of...

Journal

Journal title: IEEE Access

Year: 2023

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2023.3308520